
    Parametric spatial audio processing utilising compact microphone arrays

    This dissertation focuses on the development of novel parametric spatial audio techniques using compact microphone arrays. Compact arrays are of special interest because they can be fitted into portable devices, opening the possibility of exploiting immersive spatial audio algorithms in everyday life. The techniques developed in this thesis use signal processing algorithms adapted for human listeners, thus exploiting the capabilities and limitations of human spatial hearing. The findings of this research fall into three areas of spatial audio processing: directional filtering, spatial audio reproduction, and direction of arrival estimation.
In directional filtering, two novel algorithms have been developed based on the cross-pattern coherence (CroPaC). The method exploits the directional responses of two different types of beamformers, using their cross-spectrum to estimate a soft masker. The soft masker provides a probability-like parameter that indicates whether sound is present in a specific direction. It is then used as a post-filter to further suppress directionally distributed noise at the output of a beamformer. The performance of these algorithms represents a significant improvement over previous state-of-the-art methods.
In parametric spatial audio reproduction, an algorithm is developed for multi-channel loudspeaker and headphone rendering. Current limitations in spatial audio reproduction are related either to high coherence between the output channels, which is common in signal-independent systems, or to time-frequency artefacts in parametric systems. The developed algorithm addresses these limitations by utilising two sets of beamformers.
The first set of beamformers, namely the analysis beamformers, is used to estimate a set of perceptually relevant sound-field parameters, such as the separate channel energies, inter-channel time differences and inter-channel coherences of the target output-setup signals. The directionality of the analysis beamformers is defined to follow that of typical loudspeaker panning functions and, for headphone reproduction, that of the head-related transfer functions (HRTFs). The directionality of the second set of beamformers, which provide high audio quality, is then enhanced with the parametric information derived from the analysis beamformers. Listening tests confirm the perceptual benefit of this type of processing. In direction of arrival (DOA) estimation, histogram analysis of beamforming-based and active-intensity-based DOA estimators has been proposed. Numerical simulations and experiments with prototype and commercial microphone arrays show that the accuracy of DOA estimation is improved.
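The cross-pattern coherence (CroPaC) soft masker from the directional filtering part of this abstract can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation. It assumes an omnidirectional signal `w` and a dipole signal `y` steered to the look direction, both already in the time-frequency domain. It uses the normalised, half-wave-rectified cross-spectrum as the mask, and it approximates expectations by averaging over frames. The function name `cropac_mask` is hypothetical.

```python
import numpy as np


def cropac_mask(w: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Soft masker from the cross-spectrum of two beamformer outputs.

    w, y: complex STFT arrays of shape (freq, frames). w is assumed to be an
    omnidirectional pattern and y a dipole pattern steered to the look direction.
    Expectations are approximated by averaging over the frame axis.
    Returns one value in [0, 1] per frequency bin.
    """
    cross = np.mean(np.real(np.conj(w) * y), axis=-1)          # Re{E[W* Y]}
    power = np.mean(np.abs(w) ** 2 + np.abs(y) ** 2, axis=-1)  # E[|W|^2 + |Y|^2]
    g = 2.0 * cross / (power + eps)                            # bounded to [-1, 1]
    return np.maximum(g, 0.0)  # half-wave rectification: negative coherence -> 0


# Hypothetical usage: a source in the look direction versus one behind the dipole.
rng = np.random.default_rng(0)
s = rng.standard_normal((4, 256)) + 1j * rng.standard_normal((4, 256))
mask_on_axis = cropac_mask(s, s)   # dipole in phase with omni: values near 1
mask_rear = cropac_mask(s, -s)     # dipole in antiphase: clipped to 0
```

The bound on `g` follows from 2 Re{W*Y} ≤ |W|² + |Y|². The resulting mask would then multiply a beamformer output bin by bin as a post-filter.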

    Parametric first-order ambisonic decoding for headphones utilising the cross-pattern coherence algorithm

    Regarding the reproduction of recorded or synthesised spatial sound scenes, perhaps the most convenient and flexible approach is to employ the Ambisonics framework. The Ambisonics framework allows for linear and non-parametric storage, manipulation and reproduction of sound-fields, described using spherical harmonics up to a given order of expansion. Binaural Ambisonic reproduction can be realised by matching the spherical harmonic patterns to a set of binaural filters, in a manner that is frequency-dependent, linear and time-invariant. However, the perceptual performance of this approach is largely dependent on the spatial resolution of the input format. When lower-order material is used as input, perceptual deficiencies such as poor localisation accuracy and colouration may easily occur. This is especially problematic, as the vast majority of existing Ambisonic recordings are available only in first order. The detrimental effects associated with lower-order Ambisonic reproduction have been well studied and documented. To improve the perceived spatial accuracy of the method, the simplest solution is to increase the spherical harmonic order at the recording stage. However, microphone arrays capable of capturing higher-order components are generally much more expensive than first-order arrays, while more affordable options tend to offer higher-order components only over limited frequency ranges. Additionally, an increase in spherical harmonic order requires an increase in the number of channels and in storage and, in the case of transmission, more bandwidth. Furthermore, this solution does not aid in the reproduction of existing lower-order recordings. For these reasons, this work focuses on alternative methods that improve the reproduction of first-order material for headphone playback.
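The linear, time-invariant binaural decoding mentioned above can be illustrated with a least-squares fit of the spherical harmonic patterns to HRTFs at a single frequency. The sketch makes three assumptions:

- real first-order spherical harmonics in ACN channel order with SN3D normalisation;
- a set of measurement directions;
- a hypothetical HRTF matrix.

This is one common way to design such a decoder, not necessarily the one used in the paper. The sketch also shows why the channel count grows with order: a full-sphere order-N signal set has (N+1)² channels.

```python
import numpy as np


def num_channels(order: int) -> int:
    """Number of spherical harmonic channels for a full-sphere order-N signal set."""
    return (order + 1) ** 2


def sh_first_order(dirs: np.ndarray) -> np.ndarray:
    """Real first-order SH (ACN order W, Y, Z, X; SN3D) for unit vectors of shape (Q, 3)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    return np.stack([np.ones_like(x), y, z, x], axis=1)  # (Q, 4)


def ls_binaural_decoder(hrtf: np.ndarray, dirs: np.ndarray) -> np.ndarray:
    """Least-squares decoder D (2 x 4) such that D @ Y(dir) approximates hrtf at each dir.

    hrtf: complex array (2, Q) of left/right HRTFs at one frequency bin (hypothetical data).
    """
    Y = sh_first_order(dirs)           # (Q, 4)
    return hrtf @ np.linalg.pinv(Y).T  # solves min ||D Y^T - H||_F
```

Because the decoder is a fixed matrix per frequency, its spatial detail is limited to what four first-order patterns can represent. This is the limitation that motivates the parametric alternative.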
For the task of binaural sound-field reproduction, an alternative is to employ a parametric approach, which divides the sound-field decoding into analysis and synthesis stages. Unlike Ambisonic reproduction, which operates via a linear combination of the input signals, parametric approaches operate in the time-frequency domain and rely on the extraction of spatial parameters during their analysis stage. These spatial parameters are then used to conduct a more informed reproduction in the synthesis stage. Parametric methods are capable of reproducing sounds at a spatial resolution that far exceeds that of their linear and time-invariant counterparts, as they are not bounded by the resolution of the input format. For example, they can directly convolve the analysed source signals with the Head-Related Transfer Functions (HRTFs) corresponding to their analysed directions; an infinite spherical harmonic order would be required to attain the same resolution with a binaural Ambisonic decoder. The most well-known and established parametric reproduction method is Directional Audio Coding (DirAC), which employs a sound-field model consisting of one plane-wave and one diffuseness estimate per time-frequency tile. For first-order input, these parameters are derived from the active-intensity vector. More recent formulations allow for multiple plane-wave and diffuseness estimates via spatially localised active-intensity vectors, using higher-order input. Another parametric method is High Angular Resolution plane-wave Expansion (HARPEX), which extracts two plane-waves per frequency and is restricted to first-order input. The Sparse-Recovery method extracts up to half as many plane-waves as there are input channels, for input of arbitrary order.
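For first-order input, the DirAC-style parameters mentioned above (one direction and one diffuseness value per time-frequency tile) can be estimated from the active-intensity vector. This is a sketch under stated assumptions:

- The signals are B-format STFT coefficients scaled so that a single plane wave from unit direction u gives [X, Y, Z] = W·u (an SN3D-like convention).
- The sign is chosen so that the intensity-derived vector points towards the source.
- Expectations are replaced by averages over frames.

The function name `dirac_parameters` is hypothetical.

```python
import numpy as np


def dirac_parameters(w, x, y, z, eps=1e-12):
    """DOA unit vector and diffuseness for one frequency bin.

    w, x, y, z: complex arrays of shape (frames,) holding B-format STFT coefficients.
    Returns (doa, psi) with doa a unit 3-vector and psi in [0, 1].
    """
    v = np.stack([x, y, z])                                   # (3, frames)
    intensity = np.mean(np.real(np.conj(w) * v), axis=1)      # points towards the source here
    energy = 0.5 * np.mean(np.abs(w) ** 2 + np.sum(np.abs(v) ** 2, axis=0))
    norm = np.linalg.norm(intensity)
    doa = intensity / (norm + eps)
    psi = 1.0 - norm / (energy + eps)                         # 0: single plane wave, ->1: diffuse
    return doa, float(np.clip(psi, 0.0, 1.0))
```

For a single plane wave the averaged intensity norm equals the energy, so ψ is 0. For mutually incoherent channels the averaged intensity tends to zero and ψ approaches 1.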
The COding and Multi-Parameterisation of Ambisonic Sound Scenes (COMPASS) method also extracts source components up to half the number of input channels, but employs an additional residual stream that encapsulates the remaining diffuse and ambient components in the scene. In this paper, a new binaural parametric decoder for first-order input is proposed. The method employs a sound-field model of one plane-wave and one diffuseness estimate per frequency, much like the DirAC model. However, the source component directions are identified via a plane-wave decomposition using a dense scanning grid and peak-finding, which is shown to be more robust than the active-intensity vector for multiple narrow-band sources. The source and ambient components per time-frequency tile are then segregated, and their relative energetic contributions are established, using the Cross-Pattern Coherence (CroPaC) spatial filter. This approach is shown to be more robust than deriving this energy information from the active-intensity-based diffuseness estimates. A real-time audio plug-in implementation of the proposed approach is also described. A multiple-stimulus listening test was conducted to evaluate the perceived spatial accuracy and fidelity of the proposed method, alongside both first-order and third-order Ambisonic reproduction. The listening test results indicate that the proposed parametric decoder, using only first-order signals, is capable of delivering perceptual accuracy that matches or surpasses that of third-order Ambisonic decoding.
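The dense-grid scan with peak-finding described above can be sketched for the horizontal plane only. The sketch assumes first-order signals scaled so that a plane wave from azimuth θ gives (W, X, Y) = s·(1, cos θ, sin θ). It estimates the spatial covariance per frequency bin by frame averaging and steers a 2D first-order plane-wave-decomposition beam with response 1 + 2cos(Δθ). This is a simplified stand-in for the paper's decomposition; the grid resolution and function names are illustrative.

```python
import numpy as np


def pwd_power_map(b: np.ndarray, grid_deg: np.ndarray) -> np.ndarray:
    """Steered power of a 2D first-order plane-wave-decomposition beam.

    b: complex array (3, frames) with rows W, X, Y for one frequency bin.
    grid_deg: scanning azimuths in degrees.
    """
    C = b @ b.conj().T / b.shape[1]                                     # spatial covariance (3, 3)
    th = np.deg2rad(grid_deg)
    A = np.stack([np.ones_like(th), 2 * np.cos(th), 2 * np.sin(th)])   # steering weights (3, G)
    return np.real(np.einsum("ig,ij,jg->g", A, C, A))                  # a^T C a per grid point


def find_peaks_circular(p: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k largest circular local maxima of p."""
    is_peak = (p > np.roll(p, 1)) & (p >= np.roll(p, -1))
    idx = np.flatnonzero(is_peak)
    return idx[np.argsort(p[idx])[::-1][:k]]
```

With the one-plane-wave-per-tile model, only the strongest peak (k = 1) would be kept. The first-order beam also has a weaker rear lobe, which shows up as a secondary local maximum.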

    DOA estimation with histogram analysis of spatially constrained active intensity vectors

    The active intensity vector (AIV) is a common descriptor of the sound field. In microphone array processing, the AIV is commonly approximated with beamforming operations and utilized as a direction of arrival (DOA) estimator. However, in its original form, it provides inaccurate estimates in sound field conditions where coherent sound sources are simultaneously active. In this work, we utilize a higher-order intensity-based DOA estimator on spatially constrained regions (SCRs) to overcome such limitations. We then apply one-dimensional (1D) histogram processing to the noisy estimates for multiple DOA estimation. The performance of the estimator is demonstrated with a 7-channel microphone array, fitted on a rigid mobile-like device, in reverberant conditions and under different signal-to-noise ratios.
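The 1D histogram processing step can be sketched as follows. Noisy azimuth estimates, for example one per time-frequency tile from an intensity-based estimator, are binned on a circular grid and lightly smoothed. Peaks are then picked greedily, suppressing nearby bins after each pick so that one source is not counted twice. The bin width, the 3-bin smoothing, the minimum separation and the function name are assumptions for illustration, not parameters from the paper.

```python
import numpy as np


def histogram_doas(azi_deg, num_sources, bin_width=5.0, min_sep_deg=20.0):
    """Estimate multiple azimuths (degrees) from noisy per-tile DOA estimates."""
    edges = np.arange(-180.0, 180.0 + bin_width, bin_width)
    wrapped = (np.asarray(azi_deg, dtype=float) + 180.0) % 360.0 - 180.0  # into [-180, 180)
    counts, _ = np.histogram(wrapped, bins=edges)
    # circular 3-tap smoothing to reduce spurious peaks caused by estimator noise
    smooth = (np.roll(counts, 1) + counts + np.roll(counts, -1)) / 3.0
    centres = 0.5 * (edges[:-1] + edges[1:])
    found = []
    for _ in range(num_sources):
        k = int(np.argmax(smooth))
        found.append(centres[k])
        # suppress bins within min_sep_deg (circular distance) of the chosen peak
        dist = np.abs((centres - centres[k] + 180.0) % 360.0 - 180.0)
        smooth[dist < min_sep_deg] = -np.inf
    return np.sort(np.array(found))
```

Greedy picking with suppression is more robust than taking the top-K local maxima. Plateaus or ripples on the flank of one strong cluster cannot then displace a second, genuine source.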

    Parametric time-frequency domain spatial audio

    This book provides readers with the principles and best practices in spatial audio signal processing. It describes how sound fields and their perceptual attributes are captured and analyzed within the time-frequency domain, how essential representation parameters are coded, and how such signals are efficiently reproduced for practical applications. The book is split into four parts, starting with an overview of the fundamentals. It then explains the reproduction of spatial sound before examining signal-dependent spatial filtering. The book finishes with coverage of both current and future applications and the direction in which spatial audio research is heading. Parametric Time-frequency Domain Spatial Audio focuses on applications in entertainment audio, including music, home cinema, and gaming, covering the capturing and reproduction of spatial sound as well as its generation, transduction, representation, transmission, and perception. This book teaches readers the tools needed for such processing and provides an overview of existing research. It also presents recent projects and commercial applications built on top of these systems.
    * Provides an in-depth presentation of the principles, past developments, state-of-the-art methods, and future research directions of spatial audio technologies
    * Includes contributions from leading researchers in the field
    * Offers MATLAB code with selected chapters
    Aimed at readers who are capable of digesting mathematical expressions about digital signal processing and sound field analysis, Parametric Time-frequency Domain Spatial Audio is an advanced book best suited for researchers in academia and in the audio industry.

    3D DOA estimation of multiple sound sources based on spatially constrained beamforming driven by intensity vectors

    Sound source localization in three dimensions with microphone arrays is an active field of research, applicable in sound enhancement, source separation, sound field analysis and teleconferencing systems. In this contribution, we propose a method for three-dimensional localization of multiple sound sources in reverberant environments, employing a spatially constrained steered response beamformer on the DOA estimates of the intensity vector. The method significantly improves on previously proposed work by the authors, which relied solely on intensity vector estimates. Experiments are performed in both simulated and real acoustical environments with a spherical microphone array, for multiple sound sources under different reverberation and signal-to-noise ratio (SNR) conditions. The performance of the proposed method is compared with our previously proposed work and with a subspace method in the spherical harmonic domain that utilizes the direct-path dominance test. The results demonstrate a significant improvement in terms of localization accuracy.
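One reading of the spatially constrained search described above works in two steps: take a coarse intensity-based DOA estimate, then evaluate a steered-response-power (SRP) map only on grid directions inside a cone around that estimate and keep the maximum. This is an interpretation for illustration, not the paper's exact procedure. The sketch assumes the SRP map is supplied as a callable over unit vectors (for example, from a spherical-harmonic beamformer). The cone half-angle, the Fibonacci scanning grid and the function names are hypothetical.

```python
import numpy as np
from typing import Callable


def fibonacci_grid(n: int) -> np.ndarray:
    """Roughly uniform unit vectors on the sphere, shape (n, 3)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (1.0 + 5 ** 0.5) * i
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def constrained_srp_refine(
    coarse_doa: np.ndarray,
    srp: Callable[[np.ndarray], np.ndarray],
    grid: np.ndarray,
    half_angle_deg: float = 20.0,
) -> np.ndarray:
    """Refine a coarse DOA by maximising SRP only inside a cone around it."""
    u = coarse_doa / np.linalg.norm(coarse_doa)
    inside = grid @ u >= np.cos(np.deg2rad(half_angle_deg))  # spatial constraint
    candidates = grid[inside]
    if candidates.size == 0:
        return u  # nothing to scan: keep the coarse estimate
    return candidates[np.argmax(srp(candidates))]
```

Restricting the scan to a cone keeps a stronger source elsewhere in the map from capturing the estimate. It also cuts the number of SRP evaluations compared with a full-sphere search.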

    Real-time conversion of sensor array signals into spherical harmonic signals with applications to spatially localised sub-band sound-field analysis

    This paper presents two real-time audio plug-ins for processing sensor array signals for sound-field visualization. The first plug-in uses spherical or cylindrical sensor array specifications to provide analytical spatial filters, which encode the array signals into spherical harmonic signals. The second plug-in uses these intermediate signals to estimate the direction-of-arrival of sound sources, based on a spatially localized pressure-intensity (SLPI) approach. The challenge with traditional pressure-intensity (PI) sound-field analysis is that it performs poorly when presented with multiple sound sources with similar spectral content. Test results indicate that the proposed SLPI approach is capable of identifying sound source directions with reduced error in various environments, compared with the PI method.
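The encoding step from array signals to spherical harmonic signals can be illustrated, at a single frequency, with a least-squares spherical harmonic transform. This is a deliberately simplified sketch. It assumes an open array of omnidirectional sensors at known directions and omits the frequency-dependent radial (mode-strength) equalisation normally needed for rigid spherical or cylindrical arrays. The first-order real SH convention (ACN order, SN3D), the cube geometry and the function names are assumptions.

```python
import numpy as np


def sh_first_order(dirs: np.ndarray) -> np.ndarray:
    """Real first-order SH matrix (ACN order W, Y, Z, X; SN3D) for unit vectors (Q, 3)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    return np.stack([np.ones_like(x), y, z, x], axis=1)


def encode_to_sh(sensor_signals: np.ndarray, sensor_dirs: np.ndarray) -> np.ndarray:
    """Least-squares SH encoding of sensor signals (Q, frames) -> (4, frames).

    No radial/mode-strength equalisation is applied (open-array assumption).
    """
    Y = sh_first_order(sensor_dirs)             # (Q, 4)
    return np.linalg.pinv(Y) @ sensor_signals   # encoding matrix (4, Q) applied to signals


# Example geometry (hypothetical): 8 sensors at the vertices of a cube.
cube = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)], float)
cube /= np.linalg.norm(cube, axis=1, keepdims=True)
```

Once the signals are in the SH domain, direction estimators such as the pressure-intensity approach can be applied independently of the array geometry.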

    Beamforming with a volumetric array of massless laser spark sources—Application in reflection tracking

    A volumetric array of laser-induced air breakdown sparks is used to produce a directional and steerable acoustic source. The laser breakdown array element is broadband, point-like, and massless. It produces an impulse-like waveform in midair, thus generating accurate spatio-temporal information for acoustic beamforming. A laser-spark scanning setup and the concept of a massless steerable source are presented and evaluated with a cubic array using an off-line, far-field delay-and-sum beamforming method. This virtual acoustic array with minimal source influence can, for instance, produce narrow transmission beams to obtain localized and directional impulse response information by reflection tracking.
    Peer reviewed
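The off-line, far-field delay-and-sum step can be sketched as follows. Assume one impulse response has been measured per spark position rᵢ, stored as rows of an array. To steer the virtual source array towards unit direction u, each response is delayed by rᵢ·u/c (made non-negative by subtracting the minimum). Wavefronts leaving in direction u then add coherently. Integer-sample delays, the speed-of-sound value and the helper names are simplifying assumptions; fractional-delay filters would normally be used.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(positions: np.ndarray, u: np.ndarray, fs: float) -> np.ndarray:
    """Non-negative integer delays (samples) steering a source array towards u."""
    proj = positions @ (u / np.linalg.norm(u))  # r_i . u in metres
    return np.round((proj - proj.min()) * fs / SPEED_OF_SOUND).astype(int)


def delay_and_sum(irs: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Delay each row of irs (N, T) by delays[i] samples and sum."""
    n, t = irs.shape
    out = np.zeros(t + int(delays.max()))
    for h, d in zip(irs, delays):
        out[d:d + t] += h
    return out


# Hypothetical 3 x 3 x 3 cubic spark grid with 0.1 m spacing.
g = np.arange(3) * 0.1
pos = np.array([[x, y, z] for x in g for y in g for z in g])
```

Sources with a larger projection on u are closer to a far-field point in that direction, so their sound arrives earlier. Delaying them by that amount aligns all arrivals.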

    Applications of Spatially Localized Active-Intensity Vectors for Sound-Field Visualization

    The purpose of this article is to detail and evaluate three alternative approaches to sound-field visualization, all of which use spatially localized active-intensity (SLAI) vectors. These SLAI vectors are of particular interest, as they allow direction-of-arrival (DoA) estimates to be extracted in multiple spatially localized sectors, such that a sound source present in one sector has reduced influence on the DoA estimate made in another sector. These DoA estimates may be used to visualize the sound-field by: (I) directly depicting the estimates as icons, with their relative size dictated by the corresponding energy of each sector; (II) generating traditional activity maps via histogram analysis of the DoA estimates; or (III) using the DoA estimates to reassign energy and subsequently sharpen traditional beamformer-based activity maps. Since the SLAI-based DoA estimates are continuous, these approaches are inherently computationally efficient, as they forgo the need for dense scanning grids to attain high-resolution imaging. Simulation results also show that these SLAI-based alternatives outperform traditional active-intensity and beamformer-based approaches in the majority of cases.
    Peer reviewed
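Approaches (II) and (III) above both accumulate sector DoA estimates on an image grid. A minimal sketch builds an energy-weighted 2D histogram. It assumes each spatially localized sector yields an (azimuth, elevation) estimate in degrees and an energy value per time-frequency tile, and each estimate deposits its sector energy into the grid cell it falls in. With unit weights this is a plain histogram-based activity map, in the spirit of (II); with sector energies it loosely resembles the energy reassignment of (III). The grid resolution and function name are illustrative assumptions.

```python
import numpy as np


def activity_map(azi_deg, elev_deg, energy=None, res_deg=5.0):
    """Accumulate DoA estimates into an (elevation x azimuth) activity map.

    energy=None gives a plain histogram; otherwise each estimate contributes
    its sector energy to the cell containing its DoA.
    """
    azi_edges = np.arange(-180.0, 180.0 + res_deg, res_deg)
    elev_edges = np.arange(-90.0, 90.0 + res_deg, res_deg)
    wrapped = (np.asarray(azi_deg, dtype=float) + 180.0) % 360.0 - 180.0  # into [-180, 180)
    hist, _, _ = np.histogram2d(
        np.asarray(elev_deg, dtype=float), wrapped,
        bins=[elev_edges, azi_edges], weights=energy,
    )
    return hist  # shape (num_elevation_bins, num_azimuth_bins)
```

Because each continuous estimate is placed directly into a cell, no beamformer has to be steered over every grid point. This is where the computational saving described above comes from.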

    Laser-induced acoustic point source for accurate impulse response measurements within the audible bandwidth

    Laser-induced air breakdown is proposed as a sound source for accurate impulse response measurements. Within the audible bandwidth, the source is repeatable, broadband, and omnidirectional. The applicability of the source was evaluated by measuring the impulse response of a room. Owing to its omnidirectionality, negligible size and short pulse duration, the proposed source provides a more accurate temporal and spatial representation of room reflections than conventional loudspeakers.
    Peer reviewed